Grouping methods for ongoing record linkage

Authors

  • Sean M. Randall
  • James H. Boyd
  • Anna M. Ferrante
  • Adrian P. Brown
  • James B. Semmens

Abstract

The grouping of record-pairs to determine which records belong to the same individual is an important part of the record linkage process. While a merge grouping approach is commonly used, other methods may be more appropriate when linking to a repository of previously linked data. In this paper, we applied a number of grouping strategies to three large-scale hospital datasets (comprising around 27 million records), each with a known truth set. These datasets were linked against a created ‘repository’ whose quality was varied. Experimental results show that alternative grouping methods can yield very large benefits in linkage quality, especially when the quality of the underlying repository is high. Best link methods can remove between 25% and 90% of matching errors, depending on the characteristics of the underlying datasets.
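The abstract does not spell out the grouping algorithms, so the following is only an illustrative sketch of the general contrast it draws, not the authors' implementation. It assumes that merge grouping means taking the transitive closure of all accepted pairs, and that a best link method attaches each incoming record only to the repository group of its highest-scoring accepted pair. The function names, the `(new_record, repo_record, score)` pair layout, and the threshold value are all hypothetical.

```python
from collections import defaultdict


def merge_grouping(pairs, threshold):
    """Assumed merge grouping: union every accepted pair (transitive closure)."""
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, score in pairs:
        ra, rb = find(a), find(b)  # registers both records even if rejected
        if score >= threshold and ra != rb:
            parent[ra] = rb

    groups = defaultdict(set)
    for x in list(parent):
        groups[find(x)].add(x)
    return sorted(sorted(g) for g in groups.values())


def best_link_grouping(pairs, threshold, repo_group):
    """Assumed best link: each new record joins only its top-scoring repository group.

    repo_group maps each repository record to its existing group id.
    """
    best = {}  # new_record -> (repo_record, score) of the best accepted pair
    for new_rec, repo_rec, score in pairs:
        current = best.get(new_rec, (None, float("-inf")))[1]
        if score >= threshold and score > current:
            best[new_rec] = (repo_rec, score)

    # Start from the repository's existing groups; they are never merged.
    groups = defaultdict(set)
    for rec, gid in repo_group.items():
        groups[gid].add(rec)

    for new_rec, _, _ in pairs:
        if new_rec in best:
            groups[repo_group[best[new_rec][0]]].add(new_rec)
        else:
            groups[("new", new_rec)].add(new_rec)  # unmatched: singleton group
    return sorted(sorted(g) for g in groups.values())


# One incoming record (n1) scores above the threshold against two
# repository records that currently sit in different groups.
pairs = [("n1", "r1", 0.9), ("n1", "r2", 0.7)]
repo_group = {"r1": "A", "r2": "B"}

merged = merge_grouping(pairs, threshold=0.6)
best_linked = best_link_grouping(pairs, threshold=0.6, repo_group=repo_group)
```

In this toy case the merge sketch collapses `r1` and `r2` into one group through `n1`, whereas the best link sketch leaves the two repository groups separate and attaches `n1` only to the group of `r1`. That difference is one way a single weak pair could introduce matching errors into a high-quality repository.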

Similar resources

Probabilistic Linkage of Persian Record with Missing Data

Extended Abstract. When the comprehensive information about a topic is scattered among two or more data sets, using only one of those data sets would lead to information loss available in other data sets. Hence, it is necessary to integrate scattered information to a comprehensive unique data set. On the other hand, sometimes we are interested in recognition of duplications in a data set. The i...

A Grid and Cloud Based System for Data Grouping Computation and Online Service

Record linkage deals with finding records that identify the same real world entity, such as an individual or a business, from a given file or set of files. Record linkage problem is also referred to as the entity resolution or record recognition problem. To locate those records identifying the same real world entity, in principle, pairwise record analyses have to be performed among all records....

Validation of de-identified record linkage to ascertain hospital admissions in a cohort study

BACKGROUND Cohort studies can provide valuable evidence of cause and effect relationships but are subject to loss of participants over time, limiting the validity of findings. Computerised record linkage offers a passive and ongoing method of obtaining health outcomes from existing routinely collected data sources. However, the quality of record linkage is reliant upon the availability and accu...

Scalable Event-Based Clustering of Social Media Via Record Linkage Techniques

We tackle the problem of grouping content available in social media applications such as Flickr, YouTube, Panoramio etc. into clusters of documents describing the same event. This task has been referred to as event identification before. We present a new formalization of the event identification task as a record linkage problem and show that this formulation leads to a principled and highly ef...

Blocking Methods Applied to Casualty Records from the Syrian Conflict

Estimation of death counts and associated standard errors is of great importance in armed conflict such as the ongoing violence in Syria, as well as historical conflicts in Guatemala, Perú, Colombia, Timor Leste, and Kosovo. For example, statistical estimates of death counts were cited as important evidence in the trial of General Efraín Ríos Montt for acts of genocide in Guatemala. Estimation ...

Publication date: 2015